Comparison of traditional regression modeling vs. AI modeling for the prediction of dental caries: a secondary data analysis

Introduction There are substantial gaps in our understanding of dental caries in primary and permanent dentition and of its predictors when newer modeling methods, such as machine learning (ML) algorithms and artificial intelligence (AI), are applied. The objective of this study was to compare the accuracy, precision, and overall caries predictive capability of AI vs. traditional multivariable regression techniques. Methods The study was conducted using secondary data stored in the Temple University Kornberg School of Dentistry electronic health records system (axiUm) for pediatric patients aged 6–16 years on record at the Pediatric Dentistry Clinic. The outcome variables were the decayed–missing–filled teeth (DMFT) and decayed–extracted–filled teeth (deft) scores. The predictors included age, sex, insurance, fluoride exposure, having a dental home, consumption of sugary meals, family caries experience, having special needs, visible plaque, medications reducing salivary flow, and overall assessment questions. Results The average DMFT score was 0.85 ± 2.15, and the average deft score was 0.81 ± 2.15. For childhood dental caries, XGBoost was the best-performing ML algorithm, with accuracy, sensitivity, and Kappa of 81%, 84%, and 61%, respectively, followed by the Support Vector Machine and Lasso Regression algorithms, both with 84% specificity. The most important predictors were age and visible plaque. Conclusions The machine learning model outperformed the traditional statistical model in predicting childhood dental caries. For permanent dentition, where the traditional statistical method outperformed the machine learning model, data from a more diverse population will help improve the quality of caries prediction.


Introduction
Dental caries is the most prevalent oral disease affecting children and adolescents, resulting in deterioration of oral health and ultimately leading to tooth loss (1,2). According to the Centers for Disease Control and Prevention, about 23% of children have dental caries. Managing dental caries at an early age is therefore increasingly important, as early detection of the disease allows for more preventive management (3).
Globally, the decayed-missing-filled teeth (DMFT) index has been widely accepted as a population-based measure of dental caries. DMFT, along with other predictors such as demographics and potential risk and protective factors, has been used for individual risk assessment and as a target for caries management (3).
Standardized checklists for caries risk assessment have been used for many decades. These tools are used in day-to-day practice to support clinical decision-making and individually tailored disease prevention (4). Conducting a caries risk assessment (CRA) is the foundation of successful caries management. Well-known and widely used CRA systems include the Caries Management by Risk Assessment (CAMBRA) form, the Cariogram, the American Dental Association (ADA) checklist, and the American Academy of Pediatric Dentistry (AAPD) form (5).
Traditional multivariable regression techniques are long-established, reliable methods for caries prediction, but they are limited by the assumptions that regression modeling requires, such as independence of observations and normality, among others. Multicollinearity among variables also cannot be studied as thoroughly with regression models (6). Hence, there is a need to study caries prediction using more powerful modeling techniques. Newer techniques include artificial intelligence (AI) approaches based on machine learning (ML), which have shown higher prediction accuracy than traditional approaches and are likely to contribute significantly to the diagnostic process. ML is a subset of AI in which algorithms are trained on datasets to produce models that can perform complex tasks while reducing over- and underfitting (1). Furthermore, AI offers capabilities that traditional regression techniques lack, such as the ability to test the performance of the model itself (4).

Materials and methods
No prior study has compared AI with traditional modeling for predicting caries outcomes using risk factor information from a standardized CRA system. Therefore, the aim of this study was to compare the accuracy, precision, and overall caries predictive capability of AI vs. traditional multivariable regression techniques in a sample of pediatric patients attending the Temple University Maurice H. Kornberg School of Dentistry (TUKSoD) in Philadelphia, Pennsylvania, United States.
This secondary data analysis was approved by the Institutional Review Board (IRB) at Temple University. The secondary data belong to pediatric patients attending the clinics at the dental school.

Database and inclusion/exclusion criteria
Data collection involved requesting secondary data stored in the school's electronic health records system (axiUm) for patients aged 6-16 years who were on record at the Pediatric Dentistry Clinic. The data retrieved from axiUm covered the period from 1 March 2021 to 1 March 2022. Patients younger than 6 or older than 16 years of age and emergency patients were excluded from the study.
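The inclusion and exclusion rules above can be expressed as a single record filter. The following is a minimal sketch only: the `PatientRecord` layout and its field names are hypothetical and do not reflect the actual axiUm schema, and treating both endpoints of the 1 March 2021 to 1 March 2022 window as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record layout; field names are illustrative, not axiUm's schema.
@dataclass
class PatientRecord:
    patient_id: str
    age_years: int
    visit_date: date
    is_emergency: bool


# Study window from the text; inclusive endpoints are an assumption.
WINDOW_START = date(2021, 3, 1)
WINDOW_END = date(2022, 3, 1)


def is_eligible(rec: PatientRecord) -> bool:
    """Keep patients aged 6-16 seen in the window, excluding emergency patients."""
    in_age_range = 6 <= rec.age_years <= 16
    in_window = WINDOW_START <= rec.visit_date <= WINDOW_END
    return in_age_range and in_window and not rec.is_emergency


# Toy records, not study data.
records = [
    PatientRecord("a", 7, date(2021, 5, 10), False),   # eligible
    PatientRecord("b", 17, date(2021, 6, 1), False),   # too old
    PatientRecord("c", 12, date(2022, 4, 2), False),   # outside the window
    PatientRecord("d", 9, date(2021, 9, 15), True),    # emergency patient
]
eligible = [r.patient_id for r in records if is_eligible(r)]
print(eligible)
```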

Outcomes and predictors
The outcome variables were the DMFT and decayed-extracted-filled teeth (deft) scores. Individuals with DMFT/deft = 0 were grouped as "having no caries" and those with DMFT/deft > 0 as "having caries." The predictors included demographic variables such as age, sex (male, female, and transgender), and insurance (cash/no insurance, private insurance, Medicaid, Ryan White, and Medicare). The potential protective factors included fluoride exposure (yes/no) and having a dental home (yes/no), while the potential risk factors included consumption of sugary meals (primarily at meals vs. frequently or for prolonged periods between meals), family caries experience (none in 24 months, lesions in the last 7-23 months, and lesions in the last 6 months), having special needs [no, yes (over age 14), and yes (age 6-14)], visible plaque, and medications reducing salivary flow (yes/no). Overall assessment questions gauged whether participants were given any additional education (yes/no) or received fluoride application (yes/no) (Appendix A).
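The outcome coding described above can be sketched in a few lines. This assumes each index is the sum of its three per-tooth component counts (decayed, missing or extracted, filled), which is how DMFT and deft are defined; the function names and toy counts are illustrative only.

```python
def index_score(decayed: int, missing_or_extracted: int, filled: int) -> int:
    """DMFT (permanent teeth) and deft (primary teeth) both sum the
    per-tooth counts of their three components."""
    counts = (decayed, missing_or_extracted, filled)
    if any(c < 0 for c in counts):
        raise ValueError("component counts cannot be negative")
    return sum(counts)


def caries_status(score: int) -> str:
    """Dichotomize a DMFT or deft score: 0 -> no caries, > 0 -> caries."""
    if score < 0:
        raise ValueError("DMFT/deft scores cannot be negative")
    return "having caries" if score > 0 else "having no caries"


# Toy (decayed, missing/extracted, filled) counts, not study data.
patients = [(0, 0, 0), (2, 0, 1), (0, 0, 1)]
statuses = [caries_status(index_score(*p)) for p in patients]
print(statuses)
```

Note that the dichotomization discards severity: a child with one filled tooth and a child with ten decayed teeth fall in the same "having caries" group.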

Data analysis
Descriptive statistics, including mean ± standard deviation (SD) and range for continuous variables and frequencies for categorical variables, were generated. DMFT and deft scores were stratified by the predictors. Bivariate tests included Pearson correlation coefficients between DMFT/deft scores and selected continuous variables, independent t-tests for predictors with two categories, and analysis of variance (ANOVA) for predictors with more than two categories. DMFT and deft were also dichotomized, and chi-square tests were conducted between the dichotomized outcomes and the categorical predictor variables. The significance level was set at p < 0.05.
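For a binary predictor against the dichotomized outcome, the chi-square test reduces to a 2 × 2 table. The sketch below computes the Pearson statistic without continuity correction; whether the study applied a correction is not stated, so that choice is an assumption. For 1 degree of freedom, the upper-tail p-value equals erfc(√(x/2)), which avoids any dependency beyond the standard library. The cell counts are invented for illustration.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    with rows = predictor level and columns = caries / no caries.
    Returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Toy counts, not study data.
stat, p = chi_square_2x2(30, 20, 15, 35)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

For comparison, `scipy.stats.chi2_contingency(table, correction=False)` should give the same statistic; its default applies the Yates correction to 2 × 2 tables.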
A logistic regression model was used to test associations between the predictor variables and the dichotomized caries outcome while adjusting for confounders. The traditional negative binomial and logistic regression models were compared with AI models (Logistic Regression, XGBoost, Support Vector Machine, and Lasso Regression). The data were resampled using fivefold cross-validation, so that at each stage 80% of the data were used for training and 20% for testing. In the training phase, these supervised learning models learned the relationship between the predictors and the labeled outcome; each trained model was then evaluated by predicting the outcome for the held-out test data. XGBoost hyperparameters were tuned by adjusting nrounds (set to 10) and eta (set to 0.2), with objective = "binary:logistic". The predictive accuracy, precision, area under the receiver operating characteristic curve (AUC-ROC), specificity, sensitivity, and Kappa statistic of these models were compared.
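The 80/20 split follows directly from fivefold cross-validation: every row is assigned to exactly one of five folds, and each fold serves once as the test set. The sketch below shows plain (unstratified) shuffled folds; whether the study stratified folds by caries status is not stated, and the seed and function name are assumptions. The example uses n = 912, the sample size given in the Table 2 caption.

```python
import random


def five_fold_splits(n: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    With k = 5 each test fold holds about 20% of the rows and the
    remaining ~80% are used for training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # deal shuffled rows into k folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


splits = list(five_fold_splits(912))
for train, test in splits:
    print(len(train), len(test))
```

Because each row appears in exactly one test fold, every observation contributes one out-of-sample prediction, and the five fold-level metrics can be averaged.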
Dichotomizing DMFT and deft into categorical variables showed that DMFT was significantly associated with age, consumption of sugary meals, and visible plaque, whereas deft was significantly associated with age, family caries experience, special healthcare needs, and dental appliance use.

Traditional logistic regression
As presented in Table 4, after careful consideration of multicollinearity, interaction terms, and all fitted models, and holding all other variables in the model constant, age was significantly associated with DMFT [odds ratio (OR) = 0.89, 95% CI (0.27-0.38)]: as age increased, the odds of having dental caries experience decreased by 11%. In addition, having insurance decreased the odds of having dental caries by 12% [OR = 0.88, 95% CI (0.032-0.62)], and those who frequently consumed sugary meals and drinks had 1.19 times the odds of having dental caries compared with those who did not consume sugary meals [OR = 1.19, 95% CI (0.15-0.74)]. Lastly, those who had visible plaque had 1.18 times the odds of having dental caries compared with those who did not have plaque [OR = 1.18, 95% CI (0.09-0.78)].
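The percentages above come from the standard conversion of an odds ratio into a percent change in the odds, (OR − 1) × 100. The sketch below also shows the usual link to a logistic-regression coefficient β (OR = exp(β)) and the Wald 95% interval exp(β ± 1.96·SE); an interval computed this way always brackets the point estimate. The helper names and the SE value in the test are hypothetical, and whether Table 4 used Wald intervals is not stated.

```python
import math


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in the odds implied by an odds ratio: (OR - 1) * 100."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_from_coef(beta: float) -> float:
    """A logistic-regression coefficient (log-odds scale) maps to OR = exp(beta)."""
    return math.exp(beta)


def or_confidence_interval(beta: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Wald confidence interval for an OR: exponentiate beta -/+ z * SE."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


# Odds ratios reported in the text.
reported = {"age": 0.89, "insurance": 0.88, "sugary meals": 1.19, "visible plaque": 1.18}
for name, odds_ratio in reported.items():
    print(f"{name}: {percent_change_in_odds(odds_ratio):+.0f}% change in odds")
```

An odds ratio describes odds, not probability, so OR = 1.19 means 19% higher odds; it matches "19% more likely" only approximately, and only when the outcome is rare.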

AI machine learning (DMFT)
Comparing the traditional logistic regression, ML logistic regression, XGBoost, Lasso, and Support Vector Machine methods showed that traditional logistic regression performed best in ROC AUC, accuracy, and the Kappa statistic, whereas XGBoost performed best in sensitivity and Lasso performed best in specificity (Table 5).
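The threshold-based metrics compared in Table 5 all derive from the 2 × 2 confusion matrix of predicted vs. observed caries status. The sketch below uses the standard definitions, including Cohen's kappa, which corrects observed agreement for the agreement expected by chance from the marginal totals. The class name and the toy counts are illustrative, not study data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # caries predicted, caries observed
    fp: int  # caries predicted, no caries observed
    fn: int  # no caries predicted, caries observed
    tn: int  # no caries predicted, no caries observed

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.n

    def sensitivity(self) -> float:  # true-positive rate (recall)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:  # true-negative rate
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:  # positive predictive value
        return self.tp / (self.tp + self.fp)

    def kappa(self) -> float:
        """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
        p_obs = self.accuracy()
        # Chance agreement from the predicted and observed marginal proportions.
        p_yes = ((self.tp + self.fp) / self.n) * ((self.tp + self.fn) / self.n)
        p_no = ((self.fn + self.tn) / self.n) * ((self.fp + self.tn) / self.n)
        p_exp = p_yes + p_no
        return (p_obs - p_exp) / (1 - p_exp)


cm = ConfusionMatrix(tp=40, fp=10, fn=10, tn=40)  # toy counts, not study data
print(cm.accuracy(), cm.sensitivity(), cm.specificity(), cm.kappa())
```

Kappa is useful alongside accuracy because a model that always predicts the majority class can have high accuracy but a kappa near zero.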

AI machine learning (deft)
Comparing the traditional logistic regression, ML logistic regression, XGBoost, Lasso, and Support Vector Machine methods showed that traditional logistic regression performed best in ROC AUC. XGBoost and traditional logistic regression performed equally well in accuracy, whereas XGBoost performed best in sensitivity and the Kappa statistic. Lasso and Support Vector Machine performed equally well in specificity (Table 5).
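Unlike accuracy or Kappa, ROC AUC does not depend on a single classification threshold. One equivalent way to compute it is the probability that a randomly chosen child with caries receives a higher predicted score than a randomly chosen child without caries, with ties counted as one half. The pairwise sketch below is O(P·N) and illustrative only; the function name and scores are hypothetical, and `sklearn.metrics.roc_auc_score` should return the same value for binary labels.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as P(score of a positive > score of a negative), ties = 1/2.
    This pairwise (Mann-Whitney) form equals the area under the empirical ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = caries) and predicted probabilities, not study data.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(roc_auc(labels, scores))
```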

Comparison of variables' importance
The assessment of variable importance was similar for the classical and machine learning algorithms. For primary dental caries, age ranked highest, followed by visible plaque and special needs. For permanent dentition, the most valuable predictors were consumption of sugary meals, followed by plaque and insurance.

Discussion
Dental caries remains a major concern for dentists, dental public health professionals, and patients, as it is the most prevalent oral disease affecting children and adolescents, resulting in deterioration of oral health and ultimately leading to tooth loss (1). Although CRA is highly recommended in clinical practice for managing dental caries, it is severely underutilized by practitioners (7). Apart from CAMBRA, current CRA systems lack validation. Therefore, developing tools such as ML models that use CRA items for better risk calculation and clinical outcome prediction, in support of evidence-based caries management, would greatly help clinical practitioners. Many previous studies of dental caries predictors have used traditional statistical tools (4). No existing studies have compared traditional statistical methods with ML in predicting dental caries.
Previous studies on the prediction of childhood dental caries showed that the presence of thick and heavy plaque is a predictor of caries development, progression, and activity (8). A study by Lin et al. (9) showed that age was a useful predictor of childhood dental caries, and children with special needs have been reported to be more likely to have dental caries (10). Consistent with these studies, we found age, visible plaque, and special needs to be strong predictors of dental caries. Although consumption of sugary meals and fluoride exposure were not significantly associated with caries, they were retained in the best-fitting final model, as in the literature (11). A previous study on caries prediction in adolescents and adults identified poor oral hygiene and socioeconomic status as useful predictors (12). Our study showed that not having dental insurance was associated with dental caries.
A previous study that used ML alone to predict childhood dental caries reported an AUC-ROC of 0.74, sensitivity of 0.67, and accuracy of 0.64 (13). Another study in children (4) reported AUC-ROC values of 0.78 for ML Logistic Regression, 0.785 for XGBoost, and 0.780 for Support Vector Machine. In our data for children, traditional logistic regression yielded an AUC-ROC of 0.87 and the ML models (Logistic Regression, Lasso, XGBoost, and Support Vector Machine) yielded 0.86, both higher than the values reported in (4). XGBoost also achieved a sensitivity of 0.84 and an accuracy of 0.81, exceeding the values reported by Karhade (13). A limitation of the current study is its limited generalizability to other populations: data were obtained at Temple University Pediatrics, which serves a very high proportion of low-income African Americans in an urban setting.
For permanent dentition, traditional logistic regression yielded AUC-ROC, accuracy, and Kappa values of 0.76, 0.70, and 0.40, respectively. XGBoost had a sensitivity of 0.77, and Lasso had a specificity of 0.70. No prior studies comparing traditional and ML methods have addressed caries prediction in adolescents and adults.

Conclusions
Understanding the predictors of dental caries is key to reducing the burden of dental caries in this population. This study helps close the gap in the literature on the role of various predictors of dental caries and on the use of ML for caries prediction. Both ML and the traditional statistical tool identified predictors of dental caries. However, the ML model outperformed the traditional statistical model for primary caries prediction, with better accuracy, sensitivity, specificity, and Kappa values. Thus, these results indicate that ML is an accurate, precise, and more meaningful statistical method for enhancing dental caries risk assessment. In addition, these predictors can be used in day-to-day practice to aid clinical decision-making and disease prevention in individual patients.

TABLE 2
Bivariate analysis between DMFT and selected risk and protective factors (n = 912).
Reference categories are given in brackets; significant variables are shown in bold.

TABLE 5
Model comparison between traditional models and ML (DMFT and deft).